
    Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

    Much of scientific progress stems from previously published findings, but searching through the vast sea of scientific publications is difficult. We often rely on metrics of scholarly authority to find prominent authors, but these authority indices do not differentiate authority by research topic. We present Latent Topical-Authority Indexing (LTAI) for jointly modeling the topics, citations, and topical authority in a corpus of academic papers. Compared to previous models, LTAI differs in two main aspects. First, it explicitly models the generative process of the citations, rather than treating the citations as given. Second, it models each author's influence on citations of a paper based on the topics of the cited papers, as well as the citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS, and Citeseer. We compare the performance of LTAI against various baselines, ranging from latent Dirichlet allocation to more advanced models, including the author-link topic model and the dynamic author citation topic model. The results show that LTAI achieves improved accuracy over other similar models when predicting words, citations, and authors of publications.
    Comment: Accepted by Transactions of the Association for Computational Linguistics (TACL); to appear.
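    As a rough illustration of how a topic-conditioned citation model of this kind can be set up, the sketch below samples a citation whose probability depends on topical similarity weighted by per-topic authority. It is a minimal toy, not the paper's actual LTAI specification; the topic count, authority scores, and sampling scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics = 5          # illustrative number of topics (assumption)
n_candidates = 100    # candidate papers that could be cited (assumption)

# Topic mixtures for the citing paper and for each candidate cited paper.
theta_citing = rng.dirichlet(np.ones(n_topics))
theta_cited = rng.dirichlet(np.ones(n_topics), size=n_candidates)

# Per-topic "authority" scores attached to each candidate's authors,
# collapsed here to one score per candidate per topic for simplicity.
authority = rng.gamma(shape=2.0, scale=1.0, size=(n_candidates, n_topics))

# Score each candidate by topical overlap with the citing paper, weighted by
# topical authority, then normalize into citation probabilities and sample.
scores = (theta_cited * authority) @ theta_citing
p_cite = scores / scores.sum()
cited_idx = rng.choice(n_candidates, p=p_cite)
print(f"sampled citation -> candidate paper {cited_idx}")
```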

    Non-Linear Editor for Text-Based Screencast

    Screencasts, in which a computer screen is broadcast to a large audience on the web, are becoming popular as an online educational tool. Among the various types of screencast content, those involving text editing, such as computer programming, are especially popular. Emerging platforms support such text-based screencasts by recording every character insertion and deletion from the creator and reconstructing the playback on the viewer's screen. However, these platforms lack rich support for creating and editing the screencast itself, mainly due to the difficulty of manipulating recorded text changes; the changes are tightly coupled in sequence, so modifying an arbitrary part of the sequence is not trivial. We present a non-linear editing tool for text-based screencasts. With the proposed selective history rewrite process, our editor allows users to substitute an arbitrary part of a text-based screencast while preserving the overall consistency of the rest of the screencast.
    Comment: To appear in Adjunct Proceedings of the 30th Annual ACM Symposium on User Interface Software & Technology (UIST 2017, Poster).
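    A minimal sketch of why substituting part of a recorded edit sequence is non-trivial: later operations refer to positions produced by earlier ones, so any rewrite must leave the buffer in a state the remaining operations can still be applied to. The operation format and consistency check below are illustrative assumptions, not the editor's actual data model.

```python
def replay(ops, text=""):
    """Apply a sequence of (kind, pos, char) character edits to a text buffer."""
    buf = list(text)
    for kind, pos, char in ops:
        if kind == "ins":
            buf.insert(pos, char)
        elif kind == "del":
            del buf[pos]
    return "".join(buf)


def rewrite_span(ops, start, end, new_ops):
    """Replace ops[start:end] with new_ops and check the history still replays.

    A naive consistency check: if a later deletion now points past the end of
    the buffer, the substitution has broken the rest of the recording.
    """
    candidate = ops[:start] + new_ops + ops[end:]
    try:
        return candidate, replay(candidate)
    except IndexError:
        raise ValueError("rewrite breaks consistency of the later edits")


history = [("ins", 0, "h"), ("ins", 1, "i"), ("del", 1, None),
           ("ins", 1, "e"), ("ins", 2, "y")]
print(replay(history))                                   # -> "hey"
new_history, text = rewrite_span(history, 3, 5, [("ins", 1, "o")])
print(text)                                              # -> "ho"
```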

    Time-Aware Representation Learning for Time-Sensitive Question Answering

    Time is one of the crucial factors in real-world question answering (QA) problems. However, language models have difficulty understanding the relationships between time specifiers, such as 'after' and 'before', and numbers, since existing QA datasets do not include sufficient time expressions. To address this issue, we propose a Time-Context aware Question Answering (TCQA) framework. We suggest a Time-Context dependent Span Extraction (TCSE) task and build a time-context dependent data generation framework for model training. Moreover, we present a metric to evaluate the time awareness of the QA model using TCSE. Each TCSE instance consists of a question and four sentence candidates classified as correct or incorrect based on time and context. The model is trained to extract the answer span from the sentence that is correct in both time and context. The model trained with TCQA outperforms baseline models by up to 8.5 F1 points on the TimeQA dataset. Our dataset and code are available at https://github.com/sonjbin/TCQA
    Comment: Findings of EMNLP 2023.
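    For a concrete sense of the task setup, the sketch below shows what a single TCSE-style training instance could look like; the field names and example sentences are assumptions for illustration, not taken from the released TCQA data.

```python
# One illustrative TCSE-style instance: a question with a time specifier and
# four candidate sentences, only one of which is correct in both time and
# context. The span extractor is supervised to answer from that candidate.
tcse_example = {
    "question": "Which team did the player join after 2015?",
    "candidates": [
        {"sentence": "He joined Team A in 2013.",            "time_ok": False, "context_ok": True},
        {"sentence": "He joined Team B in 2017.",            "time_ok": True,  "context_ok": True},
        {"sentence": "He visited Team C's stadium in 2017.", "time_ok": True,  "context_ok": False},
        {"sentence": "His brother joined Team D in 2012.",   "time_ok": False, "context_ok": False},
    ],
    "answer_span": "Team B",
}

# The gold sentence is the one that satisfies both the time and context checks.
gold = next(c for c in tcse_example["candidates"] if c["time_ok"] and c["context_ok"])
assert tcse_example["answer_span"] in gold["sentence"]
```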